<!--begin.rcode results='hide', echo=FALSE, message=FALSE

library(caret)
data(BloodBrain)
theme1 <- caretTheme()
theme1$superpose.symbol$col = c(rgb(1, 0, 0, .4), rgb(0, 0, 1, .4), 
  rgb(0.3984375, 0.7578125,0.6445312, .6))
theme1$superpose.symbol$pch = c(15, 16, 17)
theme1$superpose.cex = .8
theme1$superpose.line$col = c(rgb(1, 0, 0, .9), rgb(0, 0, 1, .9), rgb(0.3984375, 0.7578125,0.6445312, .6))
theme1$superpose.line$lwd <- 2
theme1$superpose.line$lty = 1:3
theme1$plot.symbol$col = c(rgb(.2, .2, .2, .4))
theme1$plot.symbol$pch = 16
theme1$plot.cex = .8
theme1$plot.line$col = c(rgb(1, 0, 0, .7))
theme1$plot.line$lwd <- 2
theme1$plot.line$lty = 1

    hook_inline = knit_hooks$get('inline')
    knit_hooks$set(inline = function(x) {
      if (is.character(x)) highr::hi_html(x) else hook_inline(x)
      })
    opts_chunk$set(comment=NA, digits = 3)

session <- paste(format(Sys.time(), "%a %b %d %Y"),
                 "using caret version",
                 packageDescription("caret")$Version,
                 "and",
                 R.Version()$version.string)
    end.rcode--> 

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <!--
  Design by Free CSS Templates
http://www.freecsstemplates.org
Released for free under a Creative Commons Attribution 2.5 License

Name       : Emerald 
Description: A two-column, fixed-width design with dark color scheme.
Version    : 1.0
Released   : 20120902

-->
  <html xmlns="http://www.w3.org/1999/xhtml">
  <head>
  <meta name="keywords" content="" />
  <meta name="description" content="" />
  <meta http-equiv="content-type" content="text/html; charset=utf-8" />
  <title>Recursive Feature Elimination</title>
  <link href='http://fonts.googleapis.com/css?family=Abel' rel='stylesheet' type='text/css'>
  <link href="style.css" rel="stylesheet" type="text/css" media="screen" />
  </head>
  <body>
  <div id="wrapper">
  <div id="header-wrapper" class="container">
  <div id="header" class="container">
  <div id="logo">
  <h1><a href="#">Recursive Feature Elimination</a></h1>
</div>
  <!--
  <div id="menu">
  <ul>
  <li class="current_page_item"><a href="#">Homepage</a></li>
<li><a href="#">Blog</a></li>
<li><a href="#">Photos</a></li>
<li><a href="#">About</a></li>
<li><a href="#">Contact</a></li>
</ul>
  </div>
  -->
  </div>
  <div><img src="images/img03.png" width="1000" height="40" alt="" /></div>
  </div>
  <!-- end #header -->
<div id="page">
  <div id="content">

<h1>Contents</h1>  
<ul>
  <li><a href="#search">Feature Selection Using Search Algorithms</a></li>
  <li><a href="#resamp">Resampling and External Validation</a></li> 
  <li><a href="#rfe">Recursive Feature Elimination via <a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a></a></li> 
 </ul>   
     
<p><b><i>Backwards Selection</p></i></b>

<p>
First, the algorithm fits the model to all predictors. Each predictor is ranked using it's importance to the model. Let <i>S</i> be a sequence of ordered numbers which are candidate values for the number of predictors to retain (<i>S<sub>1</sub></i> &gt <i>S<sub>2</sub></i>, ...). At each iteration of feature selection, the <i>S<sub>i</sub></i> top ranked predictors are retained, the model is refit and performance is assessed. The value of <i>S<sub>i</sub></i> with the best performance is determined and the top <i>S<sub>i</sub></i> predictors are used to fit the final model. Algorithm 1 has a more complete definition.
</p>

<p>
The algorithm has an optional step (line 1.9) where the predictor rankings are recomputed on the model on the reduced feature set. <a href ="http://rd.springer.com/chapter/10.1007%2F978-3-540-25966-4_33">Svetnik <i>et al</i> (2004)</a> showed that, for random forest models, there was a decrease in performance when the rankings were re-computed at every step. However, in other cases when the initial rankings are not good (e.g. linear models with highly collinear predictors), re-calculation can slightly improve performance.
</p>

<p><br><img width = 686 height =379 src="Algo1.png"><br><br></p>


<p>
One potential issue over-fitting to the predictor set such that the wrapper procedure could focus on nuances of the training data that are not found in future samples (i.e. over-fitting to predictors and samples). 
</p>

<p>
For example, suppose a very large number of uninformative predictors were collected and one such predictor randomly correlated with the outcome. The RFE algorithm would give a good rank to this variable and the prediction error (on the same data set) would be lowered. It would take a different test/validation to find out that this predictor was uninformative. The was referred to as "selection bias" by <a href ="http://www.pnas.org/content/99/10/6562.short">Ambroise and McLachlan (2002)</a>.
</p>

<p>
In the current RFE algorithm, the training data is being used for at least three purposes: predictor selection, model fitting and performance evaluation. Unless the number of samples is large, especially in relation to the number of variables, one static training set may not be able to fulfill these needs.
</p>

<div id="resamp"></div>
<h1>Resampling and External Validation</h1>

<p>
Since feature selection is part of the model building process, resampling methods (e.g. cross-validation, the bootstrap) should factor in the variability caused by feature selection when calculating performance. For example, the RFE procedure in Algorithm 1 can estimate the model performance on line 1.7, which during the selection process.  <a href ="http://www.pnas.org/content/99/10/6562.short">Ambroise and McLachlan (2002)</a> and <a href ="http://rd.springer.com/chapter/10.1007%2F978-3-540-25966-4_33">Svetnik <i>et al</i> (2004)</a> showed that improper use of resampling to measure performance will result in models that perform poorly on new samples.
</p>

<p>
To get performance estimates that incorporate the variation due to feature selection, it is suggested that the steps in Algorithm 1 be encapsulated inside an outer layer of resampling (e.g. 10-fold cross-validation). Algorithm 2 shows a version of the algorithm that uses resampling.
</p>

<p>
While this will provide better estimates of performance, it is more computationally burdensome. For users with access to machines with multiple processors, the first <code>For</code> loop in Algorithm 2 (line 2.1) can be easily parallelized. Another complication to using resampling is that multiple lists of the "best" predictors are generated at each iteration. At first this may seem like a disadvantage, but it does provide a more probabilistic assessment of predictor importance than a ranking based on a single fixed data set. At the end of the algorithm, a consensus ranking can be used to determine the best predictors to retain.
</p>

<p><br><img width = 684 height =477 src="Algo2.png"><br><br></p>


<div id="rfe"></div>
<h1>Recursive Feature Elimination via <a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a></h1>

<p>In <a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a>, Algorithm 1 is implemented by the function <span class="mx funCall">rfeIter</span>. The resampling-based Algorithm 2 is in the <span class="mx funCall">rfe</span> function. Given the potential selection bias issues, this document focuses on <span class="mx funCall">rfe</span>.  There are several arguments:
</p>

<ul>
<li> <span class="mx arg">x</span>, a matrix or data frame of predictor variables</li> 
<li> <span class="mx arg">y</span>, a vector (numeric or factor) of outcomes</li> 
<li> <span class="mx arg">sizes</span>, a integer vector for the specific subset sizes that should be tested (which need not to include <code>ncol(x)</code>) </li> 
<li> <span class="mx funCall">rfeControl</span>, a list of options that can be used to specify the model and the methods for prediction, ranking etc.</li> 
</ul>

<p>
For a specific model, a set of functions must be specified in <code>rfeControl$functions</code>. Sections below has descriptions of these sub-functions. There are a number of pre-defined sets of functions for several models, including: linear regression (in the object <code>lmFuncs</code>), random forests (<code>rfFuncs</code>), naive Bayes (<code>nbFuncs</code>), bagged trees (<code>treebagFuncs</code>) and functions that can be used with <a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a>'s <span class="mx funCall">train</span> function (<code>caretFuncs</code>). The latter is useful if the model has tuning parameters that must be determined at each iteration.
</p>

<div id="example"></div>
<h2>An Example</h2>

<!--begin.rcode rfe_load_lib
library(caret)
library(mlbench)
library(Hmisc)
library(randomForest)
    end.rcode--> 

<p>
To test the algorithm, the "Friedman 1" benchmark (Friedman, 1991) was used. There are five informative variables generated by the equation
</p>

<p><img width = 406 height =30 src="FEq.png"></p>


<p>
In the simulation used here:
</p>

<!--begin.rcode rfe_load_sim,tidy=FALSE
n <- 100
p <- 40
sigma <- 1
set.seed(1)
sim <- mlbench.friedman1(n, sd = sigma)
colnames(sim$x) <- c(paste("real", 1:5, sep = ""),
                     paste("bogus", 1:5, sep = ""))
bogus <- matrix(rnorm(n * p), nrow = n)
colnames(bogus) <- paste("bogus", 5+(1:ncol(bogus)), sep = "")
x <- cbind(sim$x, bogus)
y <- sim$y
    end.rcode-->  

<p>
Of the <!--rinline I(ncol(x)) --> predictors, there are <!--rinline I(ncol(x)-5) --> pure noise variables: 5 are uniform on [0, 1] and <!--rinline I(p) --> are random univariate standard normals. The predictors are centered and scaled:
</p>

<!--begin.rcode rfe_load_pp
normalization <- preProcess(x)
x <- predict(normalization, x)
x <- as.data.frame(x)
subsets <- c(1:5, 10, 15, 20, 25)
    end.rcode-->  

<p>
The simulation will fit models with subset sizes of <!--rinline I(paste(sort(subsets, decreasing = TRUE), collapse = ", ")) -->. 
</p>

<p>
As previously mentioned, to fit linear models, the <code>lmFuncs</code> set of functions can be used. To do this, a control object is created with the <span class="mx funCall">rfeControl</span> function. We also specify that repeated 10-fold cross-validation should be used in line 2.1 of Algorithm 2. The number of folds can be changed via the <span class="mx arg">number</span> argument to <span class="mx funCall">rfeControl</span> (defaults to 10). The <span class="mx arg">verbose</span> option prevents copious amounts of output from being produced.
</p>

<!--begin.rcode rfe_load_lmProfile,tidy=FALSE,cache=TRUE
set.seed(10)

ctrl <- rfeControl(functions = lmFuncs,
                   method = "repeatedcv",
                   repeats = 5,
                   verbose = FALSE)

lmProfile <- rfe(x, y,
                 sizes = subsets,
                 rfeControl = ctrl)

lmProfile
    end.rcode-->  


<p>
The output shows that the  best subset size was estimated to be <!--rinline I(lmProfile$bestSubset ) --> predictors. This set includes informative variables but did not include them all. The <span class="mx funCall">predictors</span> function can be used to get a text string of variable names that were picked in the final model. The <code>lmProfile</code> is a list of class <tt><!--rinline '"rfe"' --></tt> that contains an object <code>fit</code> that is the final linear model with the remaining terms. The model can be used to get predictions for future or test samples. 
</p>

<!--begin.rcode rfe_load_lmProfile_out1
predictors(lmProfile)

lmProfile$fit
head(lmProfile$resample)
    end.rcode-->  

<p>
There are also several plot methods to visualize the results. <tt><!--rinline 'plot(lmProfile)' --></tt> produces the performance profile across different subset sizes, as shown in the figure below. 
</p>

<!--begin.rcode rfe_lmprofile,fig.width=8,fig.height=4
trellis.par.set(caretTheme())
plot(lmProfile, type = c("g", "o"))
    end.rcode--> 

<p>
Also the resampling results are stored in the sub-object <code>lmProfile$resample</code> and can be used with several lattice functions. Univariate lattice functions (<span class="mx funCall">densityplot</span>, <span class="mx funCall">histogram</span>) can be used to plot the resampling distribution while bivariate functions (<span class="mx funCall">xyplot</span>, <span class="mx funCall">stripplot</span>) can be used to plot the distributions for different subset sizes. In the latter case, the option <span class="mx arg">returnResamp</span><tt> = <!--rinline '"all"' --></tt> in <span class="mx funCall">rfeControl</span> can be used to save all the resampling results. Example images are shown below for the random forest model.
</p>

<h2>Helper Functions</h2>

<p>
To use feature elimination for an arbitrary model, a set of functions
must be passed to <span class="mx funCall">rfe</span> for each of the steps in Algorithm
2. 
</p>

<p>
This section defines those functions and uses the
existing random forest functions as an illustrative
example. <a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a> contains a list called <code>rfFuncs</code>, but
this document will use a more simple version that will be better for
illustrating the ideas. A set of simplified functions used here and called <code>rfRFE</code>.
</p>

<!--begin.rcode rfe_rfealt,tidy=FALSE
rfRFE <-  list(summary = defaultSummary,
               fit = function(x, y, first, last, ...){
                 library(randomForest)
                 randomForest(x, y, importance = first, ...)
                 },
               pred = function(object, x)  predict(object, x),
               rank = function(object, x, y) {
                 vimp <- varImp(object)
                 vimp <- vimp[order(vimp$Overall,decreasing = TRUE),,drop = FALSE]
                 vimp$var <- rownames(vimp)                  
                 vimp
                 },
               selectSize = pickSizeBest,
               selectVar = pickVars)
    end.rcode-->  


<b><p><i>The <span class="mx funCall">summary</span> Function</p></i></b>

<p>
The <span class="mx funCall">summary</span> function takes the observed and predicted values
and computes one or more performance metrics (see line
2.14). The input is a data frame with columns <code>obs</code>
and <code>pred</code>. The output should be a named vector of numeric
variables. Note that the <span class="mx arg">metric</span> argument of the <span class="mx funCall">rfe</span>
function should reference one of the names of the output of
<code>summary</code>. The example function is:
</p>

<!--begin.rcode rfe_summary
rfRFE$summary
    end.rcode--> 


<p>
Two functions in <a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a> that can be used as the summary
funciton are <span class="mx funCall">defaultSummary</span> and <span class="mx funCall">twoClassSummary</span> (for
classification problems with two classes). 
</p>


<b><p><i>The <span class="mx funCall">fit</span> Function</p></i></b>

<p>
This function builds the model based on the current data set (lines 2.3, 2.9 and 2.17). The arguments for the function must be:
</p>

<ul>
<li> <span class="mx arg">x</span>: the current training set of predictor data with the appropriate subset of variables</li> 
<li> <span class="mx arg">y</span>: the current outcome data (either a numeric or factor vector)</li> 
<li> <span class="mx arg">first</span>: a single logical value for whether the current predictor set has all possible variables (e.g. line 2.3)</li> 
  <li> <span class="mx arg">last</span>: similar to <span class="mx arg">first</span>, but <span class="mx arg">TRUE</span> when the last model is fit with the final subset size and predictors. (line 2.17) </li> 
<li> <span class="mx arg">...</span>: optional arguments to pass to the fit function in the call to <span class="mx funCall">rfe</span></li> 
</ul>

<p>
The function should return a model object that can be used to generate predictions. For random forest, the fit function is simple:
</p>

<!--begin.rcode rfe_fit
rfRFE$fit
    end.rcode--> 

<p>
For feature selection without re-ranking at each iteration, the random forest variable importances only need to be computed on the first iterations when all of the predictors are in the model. This can be accomplished using  <span class="mx arg">importance</span><tt> = <!--rinline 'first' --></tt>.
</p>

<p><i>The <span class="mx funCall">pred</span> Function</p></i>

<p>
This function returns a vector of predictions (numeric or factors) from the current model (lines 2.4 and 2.10). The input arguments must be
</p>

<ul>
<li> <span class="mx arg">object</span>: the model generated by the <code>fit</code> function</li> 
<li> <span class="mx arg">x</span>: the current set of predictor set for the held-back samples</li> 
</ul>

<p>
For random forests, the function is a simple wrapper for the predict function:
</p>

<!--begin.rcode rfe_pred
rfRFE$pred
    end.rcode--> 

<p>
For classification, it is probably a good idea to ensure that the resulting factor variables of predictions has the same levels as the input data.
</p>

<b><p><i>The <span class="mx funCall">rank</span> Function</p></i></b>

<p>
This function is used to return the predictors in the order of the most important to the least important (lines 2.5 and 2.11). Inputs are:
</p>

<ul>
<li> <span class="mx arg">object</span>: the model generated by the <code>fit</code> function</li> 
<li> <span class="mx arg">x</span>: the current set of predictor set for the training samples</li> 
<li> <span class="mx arg">y</span>: the current training outcomes</li> 
</ul>

<p>
The function should return a data frame with a column called <code>var</code> that has the current variable names. The first row should be the most important predictor etc. Other columns can be included in the output and will be returned in the final <span class="mx funCall">rfe</span> object.
</p>

<p>
For random forests, the function below uses <a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a>'s <span class="mx funCall">varImp</span> function to extract the random forest importances and orders them. For classification, <span class="mx funCall">randomForest</span> will produce a column of importances for each class. In this case, the default ranking function orders the predictors by the averages importance across the classes.
</p>

<!--begin.rcode rfe_rank
rfRFE$rank
    end.rcode--> 

<b><p><i>The <span class="mx funCall">selectSize</span> Function</p></i></b>

<p>
This function determines the optimal number of predictors based on the resampling output (line 2.15). Inputs for the function are:
</p>

<ul>
<li> <span class="mx arg">x</span>: a matrix with columns for the performance metrics and the number of variables, called <code>Variables</code></li> 
<li> <span class="mx arg">metric</span>: a character string of the performance measure to optimize (e.g. RMSE, Accuracy)</li> 
<li> <span class="mx arg">maximize</span>: a single logical for whether the metric should be maximized</li> 
</ul>

<p>
This function should return an integer corresponding to the optimal subset size.
</p>

<p>
<a href="http://cran.r-project.org/web/packages/caret/index.html"><strong>caret</strong></a> comes with two examples functions for this purpose: <span class="mx funCall">pickSizeBest</span> and <span class="mx funCall">pickSizeTolerance</span>. The former simply selects the subset size that has the best value. The latter takes into account the whole profile and tries to pick a subset size that is small without sacrificing too much performance. For example, suppose we have computed the RMSE over a series of variables sizes:
</p>

<!--begin.rcode tolerance,tidy=FALSE
example <- data.frame(RMSE = c(3.215, 2.819, 2.414, 2.144, 
                               2.014, 1.997, 2.025, 1.987, 
                               1.971, 2.055, 1.935, 1.999, 
                               2.047, 2.002, 1.895, 2.018),
                               Variables = 1:16)
    end.rcode-->  

<p>
These are depicted in the figure below. The solid circle identifies the subset size with the absolute smallest RMSE. However, there are many smaller subsets that produce approximately the same performance but with fewer predictors. In this case, we might be able to accept a slightly larger error for less predictors.
</p>

<p>
The <span class="mx funCall">pickSizeTolerance</span> determines the absolute best value then the percent difference of the other points to this value. In the case of RMSE, this would be</p>

<p><img width = 283 height = 52 src="tol.png"></p>


<p>
where <i>RMSE<sub>{opt}</sub></i> is the absolute best error rate. These "tolerance" values are plotted in the bottom panel. The solid triangle is the smallest subset size that is within 10% of the optimal value.
</p>

<p>
This approach can produce good results for many of the tree based models, such as random forest, where there is a plateau of good performance for larger subset sizes. For trees, this is usually because unimportant variables are infrequently used in splits and do not significantly affect performance.
</p>

<!--begin.rcode rfe_lmdens,tidy=FALSE
## Find the row with the absolute smallest RMSE
smallest <- pickSizeBest(example, metric = "RMSE", maximize = FALSE)
smallest

## Now one that is within 10% of the smallest
within10Pct <- pickSizeTolerance(example, metric = "RMSE", tol = 10, maximize = FALSE)
within10Pct

minRMSE <- min(example$RMSE)
example$Tolerance <- (example$RMSE - minRMSE)/minRMSE * 100   

## Plot the profile and the subsets selected using the 
## two different criteria

par(mfrow = c(2, 1), mar = c(3, 4, 1, 2))

plot(example$Variables[-c(smallest, within10Pct)], 
     example$RMSE[-c(smallest, within10Pct)],
     ylim = extendrange(example$RMSE),
     ylab = "RMSE", xlab = "Variables")

points(example$Variables[smallest], 
       example$RMSE[smallest], pch = 16, cex= 1.3)

points(example$Variables[within10Pct], 
       example$RMSE[within10Pct], pch = 17, cex= 1.3)
 
with(example, plot(Variables, Tolerance))
abline(h = 10, lty = 2, col = "darkgrey")
    end.rcode--> 

<b><p><i>The <span class="mx funCall">selectVar</span> Function</p></i></b>

<p>
After the optimal subset size is determined, this function will be used to calculate the best rankings for each variable across all the resampling iterations (line 2.16). Inputs for the function are:
</p>

<ul>
<li> <span class="mx arg">y</span>: a list of variables importance for each resampling iteration and each subset size (generated by the user-defined </li> <span class="mx funCall">rank</span> function). In the example, each each of the cross-validation groups the output of the <span class="mx funCall">rank</span> function is saved for each of the <!--rinline I(length(subsets)+1 ) --> subset sizes (including the original subset). If the rankings are not recomputed at each iteration, the values will be the same within each cross-validation iteration.</li> 
<li> <span class="mx arg">size</span>: the integer returned by the  <span class="mx funCall">selectSize</span> function</li> 
</ul>

<p>
This function should return a character string of predictor names (of length <code>size</code>) in the order of most important to least important
</p>

<p>
For random forests, only the first importance calculation (line 2.5) is used since these are the rankings on the full set of predictors. These importances are averaged and the top predictors are returned.
</p>

<!--begin.rcode rfe_selectVar,
rfRFE$selectVar
    end.rcode--> 

<p>
Note that if the predictor rankings are recomputed at each iteration (line 2.11) the user will need to write their own selection function to use the other ranks.
</p>

<h2>The Example</h2>

<p>
For random forest, we fit the same series of model sizes as the linear model. The option to save all the resampling results across subset sizes was changed for this model and are used to show the lattice plot function capabilities in the figures below.
</p>

<!--begin.rcode rfe_rf,cache=TRUE
ctrl$functions <- rfRFE
ctrl$returnResamp <- "all"
set.seed(10)
rfProfile <- rfe(x, y, sizes = subsets, rfeControl = ctrl)
rfProfile
    end.rcode--> 

<p>The resampling profile can be visualized along with plots of the individual resampling results:</p>

<!--begin.rcode rfe_rf_plot1,tidy=FALSE
trellis.par.set(caretTheme())
plot1 <- plot(rfProfile, type = c("g", "o"))
plot2 <- plot(rfProfile, type = c("g", "o"), metric = "Rsquared")
print(plot1, split=c(1,1,1,2), more=TRUE)
print(plot2, split=c(1,2,1,2))
    end.rcode--> 


<!--begin.rcode rfe_rf_plot2,tidy=FALSE
trellis.par.set(theme1)
plot1 <- xyplot(rfProfile, 
                type = c("g", "p", "smooth"), 
                ylab = "RMSE CV Estimates")
plot2 <- densityplot(rfProfile, 
                     subset = Variables < 5, 
                     adjust = 1.25, 
                     as.table = TRUE, 
                     xlab = "RMSE CV Estimates", 
                     pch = "|")
print(plot1, split=c(1,1,1,2), more=TRUE)
print(plot2, split=c(1,2,1,2))
    end.rcode--> 

<!-- ------------------------ end #content------------------------ -->

<div style="clear: both;">&nbsp;</div>
  </div>
  <!-- end #content -->
<div id="sidebar">
<ul>
  <li>
    <h2>General Topics</h2>
    <ul>
      <li><a href="index.html">Front Page</a></li>
      <li><a href="visualizations.html">Visualizations</a></li>
      <li><a href="preprocess.html">Pre-Processing</a><li>
      <li><a href="splitting.html">Data Splitting</a></li>
      <li><a href="varimp.html">Variable Importance</a></li>
      <li><a href="other.html">Model Performance</a></li>
      <li><a href="parallel.html">Parallel Processing</a></li>
    </ul>
    <h2>Model Training and Tuning</h2>
    <ul>
      <li><a href="training.html">Basic Syntax</a></li>
      <li><a href="modelList.html">Sortable Model List</a></li>
      <li><a href="bytag.html">Models By Tag</a></li>
      <li><a href="similarity.html">Models By Similarity</a></li>
      <li><a href="custom_models.html">Using Custom Models</a></li>
      <li><a href="sampling.html">Sampling for Class Imbalances</a></li> 
      <li><a href="random.html">Random Search</a></li> 
      <li><a href="adaptive.html">Adaptive Resampling</a></li> 
    </ul>
    <h2>Feature Selection</h2>
    <ul>
      <li><a href="featureselection.html">Overview</a>
      <li><a href="rfe.html">RFE</a></li>
      <li><a href="filters.html">Filters</a></li>
      <li><a href="GA.html">GA</a></li>
      <li><a href="SA.html">SA</a></li>
    </ul>  
  </li>
</ul>
</div>
<!-- end #sidebar -->
<div style="clear: both;">&nbsp;</div>
  </div>
  <div class="container"><img src="images/img03.png" width="1000" height="40" alt="" /></div>
  <!-- end #page -->
</div>
  <div id="footer-content"></div>
  <div id="footer">
<!--begin.rcode echo = FALSE
knit_hooks$set(inline = hook_inline)    
    end.rcode-->   
  <p>Created on <!--rinline I(session) -->.</p>
  </div>
  <!-- end #footer -->
</body>
  </html>

